Skyline Operator on Anti-correlated Distributions

نویسندگان

  • Haichuan Shang
  • Masaru Kitsuregawa
چکیده

Finding the skyline in a multi-dimensional space is relevant to a wide range of applications. The skyline operator over a set of d-dimensional points selects the points that are not dominated by any other point on all dimensions. Therefore, it provides a minimal set of candidates for the users to make their personal trade-off among all optimal solutions. The existing algorithms establish both the worst case complexity by discarding distributions and the average case complexity by assuming dimensional independence. However, the data in the real world is more likely to be anti-correlated. The cardinality and complexity analysis on dimensionally independent data is meaningless when dealing with anticorrelated data. Furthermore, the performance of the existing algorithms becomes impractical on anti-correlated data. In this paper, we establish a cardinality model for anticorrelated distributions. We propose an accurate polynomial estimation for the expected value of the skyline cardinality. Because the high skyline cardinality downgrades the performance of most existing algorithms on anti-correlated data, we further develop a determination and elimination framework which extends the well-adopted elimination strategy. It achieves remarkable effectiveness and efficiency. The comprehensive experiments on both real datasets and benchmark synthetic datasets demonstrate that our approach significantly outperforms the state-of-the-art algorithms under a wide range of settings.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Skyline and Mapping Aware Query Evaluation Across Disparate Data Sources

Growing interests in multi-criteria decision support applications have resulted in a flurry of efficient skyline algorithms. In practice, real-world decision support applications require to access data from disparate sources. Existing techniques define the skyline operation to work on a single set, and therefore, treat skylines as an “add-on" on top of a traditional Select-Project-Join query pl...

متن کامل

Identifying Interesting Instances for Probabilistic Skylines

Uncertain data arises from various applications such as sensor networks, scientific data management, data integration, and location based applications. While significant research efforts have been dedicated to modeling, managing and querying uncertain data, advanced analysis of uncertain data is still in its early stages. In this paper, we focus on skyline analysis of uncertain data, modeled as...

متن کامل

Skyline with Presorting

The skyline, or Pareto, operator selects those tuples that are not dominated by any others. Extending relational systems with the skyline operator would offer a basis for handling preference queries. Good algorithms are needed for skyline, however, to make this efficient in a relational setting. We propose a skyline algorithm, SFS, based on presorting that is general, for use with any skyline q...

متن کامل

A fast and progressive algorithm for skyline queries with totally- and partially-ordered domains

We devise a skyline algorithm that can efficiently mitigate the enormous overhead of processing millions of tuples on totallyand partially-ordered domains (henceforth, TODs and PODs). With massive datasets, existing techniques spend a significant amount of time on a dominance comparison because of both a large number of skyline points and the unprogressive method of skyline computing with PODs....

متن کامل

Semi-Skyline Optimization of Constrained Skyline Queries

Skyline evaluation techniques (also known as Pareto preference queries) follow a common paradigm that eliminates data elements by finding other elements in a data set that dominate them. Nowadays already a variety of sophisticated skyline evaluation techniques are known, hence skylines are considered a well researched area. On the other hand, the skyline operator does not stand alone in databas...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:
  • PVLDB

دوره 6  شماره 

صفحات  -

تاریخ انتشار 2013